
Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed



Abstract

Popular microarchitecture simulators are typically several orders of magnitude slower than the systems they simulate. This leads to two problems. First, because of the slow simulation rate, simulation studies are usually limited to the first few billion instructions, which corresponds to less than 10% of the execution time of many standard benchmarks. Since such studies cover only a small fraction of each application, they risk reporting unrepresentative application behavior unless sampling strategies are employed. Second, the high overhead of traditional simulators makes them unsuitable for hardware/software co-design studies, where rapid turnaround is required. Despite previous efforts to parallelize simulators, the most commonly used full-system simulators remain single-threaded. In this paper, we explore a simple and effective way to parallelize sampling full-system simulators. To simulate at high speed, we need to be able to fast-forward efficiently between sample points. We demonstrate how hardware virtualization can be used to implement highly efficient fast-forwarding in the standard gem5 simulator and how this enables efficient execution between sample points. This extremely rapid fast-forwarding lets us reach new sample points much more quickly than a single sample can be simulated. Together with efficient copying of simulator state, this enables parallel execution of sample simulation. These techniques allow us to implement a highly scalable sampling simulator that exploits sample-level parallelism. We demonstrate how virtualization can be used to fast-forward simulators at 90% of native execution speed on average. Using virtualized fast-forwarding, we demonstrate a parallel sampling simulator that can accurately estimate the IPC of standard workloads with an average error of 2.2% while still reaching an execution rate of 2.0 GIPS (63% of native) on average. We demonstrate that our parallelization strategy scales almost linearly and simulates one core at up to 93% of its native execution rate, 19,000x faster than detailed simulation, while using 8 cores.
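
The interplay between fast-forwarding and parallel sample simulation can be illustrated with a toy scheduling model. The sketch below is not the paper's implementation and does not use gem5. The function name, its parameters, and the rates in the usage note are hypothetical, chosen only to make the arithmetic easy to follow. It assumes a single fast-forwarding process that hands a copy of its state to a worker at every sample point and stalls only when all workers are busy.

```python
import heapq


def parallel_sampling_wall_time(total_insts, interval, sample_len,
                                ff_rate, detailed_rate, workers):
    """Wall-clock seconds for a toy parallel sampling run (hypothetical model).

    A single fast-forwarder advances at ff_rate instructions per second.
    Every `interval` instructions it reaches a sample point and hands a copy
    of its state to a worker, which simulates `sample_len` instructions in
    detail at detailed_rate instructions per second. If all `workers` are
    busy, the fast-forwarder waits for the earliest one to finish.
    """
    if workers < 1:
        raise ValueError("at least one detailed-simulation worker is required")

    busy_until = []  # min-heap of finish times of samples still in flight
    now = 0.0
    n_samples = total_insts // interval

    for _ in range(n_samples):
        # Fast-forward to the next sample point.
        now += interval / ff_rate
        # With every worker occupied, wait for the earliest to finish.
        if len(busy_until) == workers:
            now = max(now, heapq.heappop(busy_until))
        # Start the detailed simulation of this sample on a free worker.
        heapq.heappush(busy_until, now + sample_len / detailed_rate)

    # Fast-forward through any instructions after the last sample point.
    now += (total_insts - n_samples * interval) / ff_rate
    # The run ends when fast-forwarding and all samples are done.
    return max([now] + busy_until)


if __name__ == "__main__":
    for n in (1, 2, 10):
        t = parallel_sampling_wall_time(1000, 100, 10, 100.0, 1.0, n)
        print(f"{n:2d} worker(s): {t:.0f} s")
```

With these made-up rates, one worker spends almost all of its time in detailed simulation, while ten workers let the fast-forwarder run without stalling. This is the regime the abstract describes: once sample points are reached faster than a single sample can be simulated, adding workers shortens the run until fast-forwarding itself becomes the limit.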
